To be accepted by Wisent, a context-free grammar must respect a particular format. That is, it must be represented as an Emacs Lisp list of the form:
(terminals assocs . non-terminals)
terminals
Is the list of terminal symbols used in the grammar.
assocs
Specifies the associativity of terminals. It is nil when there is no associativity defined, or an alist of (assoc-type . assoc-value) elements.
assoc-type must be one of the default-prec, nonassoc, left or right symbols. When assoc-type is default-prec, assoc-value must be nil or t (the default). Otherwise it is a list of tokens which must have been previously declared in terminals.
For details, see (bison)Contextual Precedence, in the Bison manual.
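As a hedged illustration of the alist shape described above, the sketch below builds a hypothetical assocs value and inspects it with ordinary list functions. The name my-assocs and the choice of operators are assumptions made for this example, and the tokens are assumed to have been declared in terminals.

```elisp
;; Hypothetical assocs value; the operators are only illustrative.
(defconst my-assocs
  '((nonassoc ?=)    ; `=' does not associate
    (left ?+ ?-)     ; `+' and `-' associate to the left
    (left ?*))       ; `*' associates to the left
  "Example associativity alist (an assumption, not taken from Wisent).")

;; (left ?+ ?-) is the cons of the symbol `left' and the token list.
(equal '(left ?+ ?-) (cons 'left (list ?+ ?-)))   ; => t

;; A plain alist lookup finds the first element of a given type and
;; returns its token list (43 and 45 are the codes of `+' and `-').
(cdr (assq 'left my-assocs))                      ; => (43 45)
```

Because several elements may share the same assoc-type, a lookup with assq only ever sees the first one; code that needs every association has to walk the whole list.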
non-terminals
Is the list of nonterminal definitions. Each definition has the form:
(nonterm . rules)
Where nonterm is the nonterminal symbol defined and rules the list of rules that describe this nonterminal. Each rule is a list:
(components [precedence] [action])
Where:
components
Is a list of various terminals and nonterminals that are put together by this rule.
For example,
(exp ((exp ?+ exp)) ;; exp: exp '+' exp
) ;; ;
Says that two groupings of type ‘exp’, with a ‘+’ token in between, can be combined into a larger grouping of type ‘exp’.
By convention, a nonterminal symbol should be in lower case, such as ‘exp’, ‘stmt’ or ‘declaration’. Terminal symbols should be upper case to distinguish them from nonterminals: for example, ‘INTEGER’, ‘IDENTIFIER’, ‘IF’ or ‘RETURN’. A terminal symbol that represents a particular keyword in the language is conventionally the same as that keyword converted to upper case. The terminal symbol error is reserved for error recovery.
Scattered among the components can be middle-rule actions. Usually only action is provided (see action).
If components in a rule is nil, it means that the rule can match the empty string. For example, here is how to define a comma-separated sequence of zero or more ‘exp’ groupings:
(expseq (nil) ;; expseq: ;; empty
((expseq1)) ;; | expseq1
) ;; ;
(expseq1 ((exp)) ;; expseq1: exp
((expseq1 ?, exp)) ;; | expseq1 ',' exp
) ;; ;
precedence
Assigns the rule the precedence of the given terminal item, overriding the precedence that would otherwise be deduced for it, namely that of the last terminal in the rule. Notice that only terminals declared in assocs have a precedence level. The altered rule precedence then affects how conflicts involving that rule are resolved.
precedence is an optional vector of one terminal item.
Here is how precedence solves the problem of unary minus. First, declare a precedence for a fictitious terminal symbol named UMINUS. There are no tokens of this type, but the symbol serves to stand for its precedence:
…
((default-prec t) ;; This is the default
 (left ?+ ?-)
 (left ?*)
 (left UMINUS))
Now the precedence of UMINUS can be used in specific rules:
(exp … ;; exp: …
((exp ?- exp)) ;; | exp '-' exp
… ;; …
((?- exp) [UMINUS]) ;; | '-' exp %prec UMINUS
… ;; …
) ;; ;
If you forget to append [UMINUS] to the rule for unary minus, Wisent silently assumes that minus has its usual precedence. This kind of problem can be tricky to debug, since one typically discovers the mistake only by testing the code.
Using the (default-prec nil) declaration makes it easier to discover this kind of problem systematically. It causes rules that lack a precedence modifier to have no precedence, even if the last terminal symbol mentioned in their components has a declared precedence.
If (default-prec nil) is in effect, you must specify precedence for all rules that participate in precedence conflict resolution. You will then see every shift/reduce conflict until you tell Wisent how to resolve it, either by changing your grammar or by adding an explicit precedence. This will probably add declarations to the grammar, but it helps to protect against incorrect rule precedences.
The effect of (default-prec nil) can be reversed by giving (default-prec t), which is the default.
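The fragment below sketches how the unary minus grammar might look with (default-prec nil) in effect. It is an illustration under assumptions rather than a tested grammar: the NUM terminal and the action bodies are hypothetical, and so is the use of character literals such as [?+] inside a precedence vector.

```elisp
;; assocs: precedence is no longer deduced from the last terminal.
((default-prec nil)
 (left ?+ ?-)
 (left ?*)
 (left UMINUS))

;; Each operator rule now names its precedence explicitly.
(exp ((NUM))                            ;; exp: NUM
     ((exp ?+ exp) [?+] (+ $1 $3))      ;;    | exp '+' exp  %prec '+'
     ((exp ?- exp) [?-] (- $1 $3))      ;;    | exp '-' exp  %prec '-'
     ((exp ?* exp) [?*] (* $1 $3))      ;;    | exp '*' exp  %prec '*'
     ((?- exp) [UMINUS] (- $2))         ;;    | '-' exp      %prec UMINUS
     )                                  ;;    ;
```

In this arrangement a forgotten [UMINUS] leaves the unary minus rule with no precedence at all, so the omission should surface as an unresolved conflict instead of a silently wrong parse.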
For more details, see (bison)Contextual Precedence, in the Bison manual.
It is important to understand that assocs declarations define associativity but also assign a precedence level to terminals. All terminals declared in the same left, right or nonassoc association get the same precedence level. The precedence level is increased at each new association.
On the other hand, precedence explicitly assigns the precedence level of the given terminal to a rule.
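The level numbering can be pictured with a few lines of plain Emacs Lisp. The helper below is a sketch written for this explanation, not part of Wisent; numbering the levels from 1 and skipping default-prec entries are assumptions. It only mirrors the rule that each new association raises the level and that the tokens of one association share it.

```elisp
(defun my-precedence-levels (assocs)
  "Return an alist (TOKEN . LEVEL) computed from ASSOCS.
Illustration only: this is not how Wisent stores precedence."
  (let ((level 0)
        (result nil))
    (dolist (entry assocs (nreverse result))
      ;; Assumption: `default-prec' entries do not open a level.
      (unless (eq (car entry) 'default-prec)
        (setq level (1+ level))
        (dolist (token (cdr entry))
          (push (cons token level) result))))))

(my-precedence-levels
 '((default-prec t)
   (left ?+ ?-)
   (left ?*)
   (left UMINUS)))
;; => ((43 . 1) (45 . 1) (42 . 2) (UMINUS . 3))
```

Here 43, 45 and 42 are the character codes of `+', `-' and `*'. A rule ending in `-' would take level 1 by default, while a rule marked [UMINUS] takes level 3.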
An action is an optional Emacs Lisp function call, like this:
(identity $1)
The result of an action determines the semantic value of a rule.
From an implementation standpoint, the function call will be embedded in a lambda expression, and several useful local variables will be defined:
$n
Where n is a positive integer. As in Bison, the value of $n is the semantic value of the nth element of components, starting from 1. It can be of any Lisp data type.
$regionn
Where n is a positive integer. For each $n variable defined there is a corresponding $regionn variable. Its value is a pair (start-pos . end-pos) that represents the start and end positions (in the lexical input stream) of the $n value. It can be nil when the component positions are not available, as for an empty string component.
$region
Its value is the leftmost and rightmost positions of input data matched by all components in the rule. This is a pair (leftmost-pos . rightmost-pos). It can be nil when the component positions are not available.
$nterm
This variable is initialized with the nonterminal symbol (nonterm) the rule belongs to. It can be useful to improve error reporting or debugging. It is also used to automatically provide incremental re-parse entry points for Semantic tags (see Wisent Semantic).
$action
The value of $action is the symbolic name of the current semantic action (see Debugging actions).
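To make the variables above more concrete, here is a hypothetical rule fragment whose action uses several of them. The shape of the returned list is an assumption chosen for this example; Wisent does not require any particular value layout.

```elisp
;; exp: exp '+' exp
(exp ((exp ?+ exp)
      (list $nterm       ;; the nonterminal symbol, here `exp'
            (+ $1 $3)    ;; values of the first and third components
            $region2     ;; (start-pos . end-pos) of the `+' token, or nil
            $region)))   ;; (leftmost-pos . rightmost-pos) of the rule, or nil
```

Since the region variables can be nil, an action that does arithmetic on positions would need to check for that case first.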
When an action is not specified, a default one is supplied: (identity $1). This means that the default semantic value of a rule is the value of its first component. The exception is a rule matching the empty string, for which the default action is to return nil.
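One consequence of the default action is worth a short sketch. The fragments below reuse the comma-separated sequence rules; the explicit actions in the second variant are an assumption about what a grammar writer might want, not something Wisent supplies.

```elisp
;; Default actions only: each rule returns its first component.
(expseq1 ((exp))                ;; value: $1, the value of exp
         ((expseq1 ?, exp)))    ;; value: $1, so the new exp is dropped

;; Explicit actions: the values are collected into a list.
(expseq1 ((exp) (list $1))
         ((expseq1 ?, exp) (append $1 (list $3))))
```

With the first variant the semantic value of a whole sequence is just the value of its first ‘exp’, which is rarely what is intended for list-like rules.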